In the Claims:

Claims 1-34 are pending in this application, and the status of each is listed below.

1. (Previously presented) A computer assisted method of auditing a
superset of training data, the superset comprising examples of documents having one 
or more preexisting category assignments, the method including: 

partitioning the superset into at least two disjoint sets, including a test set and a 
training set, wherein the test set includes one or more test documents and the 
training set includes examples of documents belonging to at least two categories; 

automatically categorizing the test documents using the training set; 

calculating a metric of confidence based on results of the categorizing step and 
comparing the automatic category assignments for the test documents to the 
preexisting category assignments; and 

reporting the test documents and preexisting category assignments that are
suspicious and the automatic category assignments that appear to be missing from 
the test documents, based on the metric of confidence. 
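
The comparing and reporting steps of claim 1 can be illustrated with a minimal sketch. The claim does not state how the metric of confidence is turned into a decision, so the thresholds `low` and `high`, the function name `audit_document`, and the sample scores are all assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple


def audit_document(
    preexisting: Set[str],
    confidence: Dict[str, float],
    low: float = 0.2,   # hypothetical cutoff below which an assignment looks suspicious
    high: float = 0.8,  # hypothetical cutoff above which an assignment looks missing
) -> Tuple[List[str], List[str]]:
    """Compare preexisting assignments with automatic confidence scores."""
    # Preexisting assignments that the automatic categorization barely supports.
    suspicious = sorted(c for c in preexisting if confidence.get(c, 0.0) < low)
    # Categories the automatic categorization strongly supports but that are absent.
    missing = sorted(
        c for c, score in confidence.items()
        if c not in preexisting and score > high
    )
    return suspicious, missing


suspicious, missing = audit_document(
    preexisting={"sports", "finance"},
    confidence={"sports": 0.9, "finance": 0.1, "politics": 0.85},
)
print(suspicious, missing)  # ['finance'] ['politics']
```

Two separate cutoffs are used here so that mid-range scores produce no report entry; a single threshold would be an equally plausible reading.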

2. (Original) The method of claim 1, further including repeating the
partitioning, categorizing and calculating steps until at least one-half of the 
documents in the superset have been assigned to the test set. 

3. (Original) The method of claim 2, wherein the test set created in the 
partition step has a single test document. 

4. (Original) The method of claim 2, wherein the test set created in the 
partition step has a plurality of test documents. 

5. (Original) The method of claim 1, further including repeating the
partitioning, categorizing and calculating steps until substantially all of the documents in 
the superset have been assigned to the test set. 
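
The repeated partitioning of claims 2 through 5 resembles a leave-one-out or chunked hold-out loop. The sketch below is one possible reading, not the applicant's implementation; the function name `partitions` and the use of integer document indices are assumptions.

```python
from typing import Iterator, List, Tuple


def partitions(n_docs: int, test_size: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield disjoint (test, training) index lists until every document has been tested."""
    if test_size < 1:
        raise ValueError("test_size must be at least 1")
    indices = list(range(n_docs))
    for start in range(0, n_docs, test_size):
        test = indices[start:start + test_size]
        # The training set is everything in the superset that is not in the test set.
        train = indices[:start] + indices[start + test_size:]
        yield test, train


for test, train in partitions(5, test_size=2):
    print(test, train)
# [0, 1] [2, 3, 4]
# [2, 3] [0, 1, 4]
# [4] [0, 1, 2, 3]
```

With `test_size=1` each test set holds a single document, as in claim 3; a larger value gives the plural test set of claim 4. Running the loop to completion places every document in a test set once, which corresponds to claim 5.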

6. (Original) The method of claim 1, wherein the partitioning, categorizing
and calculating steps are carried out substantially without user intervention. 

7. (Original) The method of claim 5, wherein the partitioning, categorizing
and calculating steps are carried out substantially without user intervention. 

8. (Original) The method of claim 1, wherein the partitioning, categorizing, 
calculating and reporting steps are carried out substantially without user intervention.

9. (Original) The method of claim 5, wherein the partitioning, categorizing, 
calculating and reporting steps are carried out substantially without user intervention. 

10. (Original) The method of claim 1, wherein the categorizing step includes
determining k nearest neighbors of the test documents and the calculating step is based 
on a k nearest neighbors categorization logic. 

11. (Previously presented) The method of claim 10, wherein the metric of
confidence is an unweighted measure of distance between a particular test document 
and the examples of documents belonging to various categories. 

12. (Previously presented) The method of claim 11, where the unweighted
measure includes application of a relationship $\Omega_0(d_t, T_m) = \sum_{d} s(d_t, d)$, wherein

$\Omega_0$ is a function of the particular test document represented by a feature
vector $d_t$ and of various categories $T_m$; and

$s$ is a metric of distance between the particular test document feature vector $d_t$
and certain sample documents represented by feature vectors $d$, the certain
sample documents being among a set of k nearest neighbors of the particular
test document having category assignments to the various categories $T_m$.
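
The unweighted measure of claim 12 can be read as summing s over those of the k nearest neighbors of the test document that carry a given category. The sketch below follows that reading. The claim does not fix a particular s, so cosine similarity (larger meaning closer) is an assumption, as are the names `cosine` and `omega0` and the toy vectors.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, used here as an assumed stand-in for the metric s."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def omega0(d_t: Vector, training: List[Tuple[Vector, Set[str]]],
           category: str, k: int) -> float:
    """Sum s(d_t, d) over the k nearest neighbors of d_t assigned to `category`."""
    scored = sorted(
        ((cosine(d_t, d), cats) for d, cats in training),
        key=lambda pair: pair[0],
        reverse=True,  # most similar first
    )
    neighbors = scored[:k]
    return sum(score for score, cats in neighbors if category in cats)


training = [([1.0, 0.0], {"A"}), ([0.0, 1.0], {"B"}), ([1.0, 1.0], {"A", "B"})]
d_t = [1.0, 0.0]
print(round(omega0(d_t, training, "A", k=2), 4))  # 1.7071
print(round(omega0(d_t, training, "B", k=2), 4))  # 0.7071
```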

13. (Previously presented) The method of claim 10, wherein the metric of
confidence is a weighted measure of distance between a particular test document and
the examples of documents belonging to various categories, the weighted measure
taking into account the density of a neighborhood of the test document.

14. (Previously presented) The method of claim 13, wherein the weighted 
measure includes application of a relationship 
$\Omega_1(d_t, T_m) = \frac{\sum_{d_1} s(d_t, d_1)}{\sum_{d_2} s(d_t, d_2)}$, wherein

$\Omega_1$ is a function of the test document represented by a feature vector $d_t$ and
of various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and certain
sample documents represented by feature vectors $d_1$ and $d_2$, the certain sample
documents $d_1$ being among a set of k nearest neighbors of the test document
having category assignments to the various categories $T_m$ and the certain sample
documents $d_2$ being among a set of k nearest neighbors of the test document.
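
One way to read the weighted measure of claim 14 is as a ratio: the sum of s over the nearest neighbors assigned to a category, divided by the sum of s over all k nearest neighbors, which normalizes for how dense the neighborhood is. The sketch below assumes the neighbors and their s values have already been computed; the function name `omega1` and the sample scores are hypothetical.

```python
from typing import List, Set, Tuple


def omega1(neighbors: List[Tuple[float, Set[str]]], category: str) -> float:
    """Ratio of in-category neighbor scores to all neighbor scores.

    `neighbors` holds one (s value, category assignments) pair for each of
    the k nearest neighbors of the test document.
    """
    total = sum(score for score, _ in neighbors)
    if total == 0:
        return 0.0  # assumed convention for an empty or zero-score neighborhood
    in_category = sum(score for score, cats in neighbors if category in cats)
    return in_category / total


neighbors = [(1.0, {"A"}), (0.5, {"A", "B"}), (0.5, {"B"})]
print(omega1(neighbors, "A"))  # 0.75
print(omega1(neighbors, "B"))  # 0.5
```

Because the result is bounded between 0 and 1 when s is non-negative, scores from sparse and dense neighborhoods can be compared against the same cutoff.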

15. (Currently amended) The method of claim 1, wherein the reporting step
further includes filtering the reported test documents based on the metric of confidence. 

16. (Currently amended) The method of claim 15, wherein filtering further 
includes color coding the [[identified]] reported test documents based on the metric of
confidence. 

17. (Currently amended) The method of claim 15, wherein filtering further 
includes selecting for display the [[identified]] reported test documents based on the metric
of confidence. 

18. (Previously presented) The method of claim 1, wherein reporting includes
generating a printed report. 

19. (Previously presented) The method of claim 1, wherein reporting includes
generating a file conforming to XML syntax. 
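
A report file conforming to XML syntax, as recited in claim 19, could be produced along the following lines. The claims do not specify a schema, so the element and attribute names (`auditReport`, `document`, `category`, `status`, `confidence`) and the sample finding are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_report(findings: List[Dict[str, str]]) -> str:
    """Serialize audit findings as a well-formed XML string."""
    root = ET.Element("auditReport")
    for finding in findings:
        doc = ET.SubElement(root, "document", id=finding["id"])
        category = ET.SubElement(
            doc,
            "category",
            status=finding["status"],          # e.g. "suspicious" or "missing"
            confidence=finding["confidence"],  # metric of confidence, as text
        )
        category.text = finding["category"]
    return ET.tostring(root, encoding="unicode")


xml_text = build_report([
    {"id": "doc-17", "category": "finance", "status": "suspicious", "confidence": "0.10"},
])
print(xml_text)
```

Building the tree with `ElementTree` rather than by string concatenation keeps the output well-formed even when category names contain characters such as `&` or `<`.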

20. (Previously presented) The method of claim 1, wherein reporting includes
generating a sorted display identifying at least a portion of the test documents. 

21. (Currently amended) The method of claim 1, further including calculating
a precision score for the [[identified]] reported test documents.

22. (Previously presented) A computer assisted method of auditing a 
superset of training data, the superset comprising examples of documents having one 
or more preexisting category assignments, the method including: 

determining k nearest neighbors of the documents in a test subset automatically
partitioned from the superset; 

automatically categorizing the documents based on the k nearest neighbors into 
a plurality of categories; 

calculating a metric of confidence based on results of the categorizing step and 
comparing the automatic category assignments for the documents to the 
preexisting category assignments; and 

reporting the documents in the test subset and preexisting category assignments 
that are suspicious and the automatic category assignments that appear to be 
missing from the documents in the test subset, based on the metric of 
confidence. 

23. (Currently amended) The method of claim 22, wherein the metric of 
confidence is an unweighted measure of distance between [[the]] a particular test
document and the examples of documents belonging to various categories. 

24. (Original) The method of claim 23, where the unweighted measure 
includes application of a relationship $\Omega_0(d_t, T_m) = \sum_{d} s(d_t, d)$, wherein

$\Omega_0$ is a function of the test document represented by a feature vector $d_t$ and
of various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and certain
sample documents represented by feature vectors $d$, the certain sample
documents being among a set of k nearest neighbors of the test document
having category assignments to the various categories $T_m$.

25. (Currently amended) The method of claim 22, wherein the metric of 
confidence is a weighted measure of distance between [[the]] a particular test document
and the examples of documents belonging to various categories, the weighted measure
taking into account the density of a neighborhood of the test document.

26. (Original) The method of claim 25, wherein the weighted measure
includes application of a relationship
$\Omega_1(d_t, T_m) = \frac{\sum_{d_1} s(d_t, d_1)}{\sum_{d_2} s(d_t, d_2)}$, wherein

$\Omega_1$ is a function of the test document represented by a feature vector $d_t$ and
of various categories $T_m$; and

$s$ is a metric of distance between the test document feature vector $d_t$ and certain
sample documents represented by feature vectors $d_1$ and $d_2$, the certain sample
documents $d_1$ being among a set of k nearest neighbors of the test document
having category assignments to the various categories $T_m$ and the certain sample
documents $d_2$ being among a set of k nearest neighbors of the test document.

27. (Original) The method of claim 22, wherein the determining, categorizing
and calculating steps are carried out substantially without user intervention.

28. (Previously presented) The method of claim 22, wherein the reporting 
step further includes filtering the documents based on the metric of confidence.

29. (Previously presented) The method of claim 28, wherein the filtering step 
further includes color coding the reported documents based on the metric of confidence. 

30. (Previously presented) The method of claim 28, wherein the filtering step 
further includes selecting for display the reported documents based on the metric of 
confidence. 

31. (Previously presented) The method of claim 22, wherein reporting
includes generating a printed report. 

32. (Previously presented) The method of claim 22, wherein reporting 
includes generating a file conforming to XML syntax. 

33. (Previously presented) The method of claim 22, wherein reporting 
includes generating a sorted display identifying at least a portion of the documents. 

34. (Previously presented) The method of claim 22, further including 
calculating a precision score for the reported documents. 